Background: what’s out there (visualization tools), why this is useful (there are not many detailed examples showing the code; talk about your experience at Sunbelt, “what’s the format of the data?”; look for papers on computing literacy), and our goal (start-to-finish network visualization: load the data, process it a little, and plot it).
One of the important aspects of network visualization is the layout algorithm used to draw the network. There are a number of popular layout algorithms, and each has its own strengths and weaknesses. The “Circle” algorithm places clusters on a circle and draws straight lines between vertices (Six and Tollis 1999). It works well for showing biconnectivity and subnetworks, but the number of edges needs to be relatively low for the connections to remain legible (Six and Tollis 1999). The DrL (Distributed Recursive Layout) algorithm employs a multilevel force-directed approach based on simulated annealing, and it works well for large abstract data sets (Martin, Brown, and Wylie 2007). The Fruchterman-Reingold layout algorithm treats vertices as atomic bodies that repel one another and edges as attractive forces between them, and it seeks a low-energy arrangement of the system (Fruchterman and Reingold 1991; Hansen, Shneiderman, and Smith 2011), which benefits large social networks. The Kamada-Kawai layout uses a spring algorithm to lay out undirected graphs in a symmetric drawing with few edge crossings, which works well for network structures (Kamada and Kawai 1989). The LGL (Large-Graph Layout) is based on a mass-spring algorithm, using edges as springs to pull vertices together while a repulsive force prevents overlapping, making it highly effective for dense biological networks (Adai et al. 2004).
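As a minimal sketch of how these layouts can be computed, the following uses the layout functions in igraph on a small random graph. The toy graph and its size are assumptions made for illustration; each function returns a matrix of vertex coordinates that a plotting function can use.

```r
library(igraph)

# Toy connected graph for illustration only (not one of the data sets in this paper)
set.seed(1)
g <- sample_pa(n = 30, m = 2, directed = FALSE)

# Each layout function returns a numeric matrix with one row per vertex
layouts <- list(
  circle = layout_in_circle(g),
  drl    = layout_with_drl(g),
  fr     = layout_with_fr(g),
  kk     = layout_with_kk(g),
  lgl    = layout_with_lgl(g)
)

# Rows and columns of each coordinate matrix
sapply(layouts, dim)
```

Any of these matrices can be passed to a plotting call, for example `plot(g, layout = layouts$kk)` in igraph, to compare how the same graph looks under different algorithms.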
The graphing parameters must also be considered. Vertex size can effectively convey varying weights and changes in information, as demonstrated by Zien, Schlag, and Chan (1999) and by Sharma and Chou (2022), where vertex size is determined by the number of outgoing edges. Vertex size can also illustrate increases or decreases in data, such as counts, as observed by Knisley and Knisley (2014). The color of vertices is equally significant, not only for enhancing visual appeal but also for differentiating objects or levels (Ognyanova, n.d.). Vertex color can also help in visualizing groupings, patterns, or clusters (Tyner, Briatte, and Hofmann 2017). Like vertex color, vertex shape contributes to both aesthetic appeal and data distinction, and Grapov and Newman (2012) show how a combination of vertex shape, size, and color can effectively differentiate data points. Finally, edge width displays the strength of connections between vertices, allowing users to see varying degrees of connection intensity. Lin (2018) provides an example where edge widths are proportional to other measured aspects in the study. Chosen thoughtfully, these components make network visualization a powerful tool for understanding intricate relationships within data.
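To make the mapping from data to visual parameters concrete, here is a minimal sketch using igraph attributes on a toy star graph. The graph, the edge weights, and the size scale are made up for illustration; they are not taken from any of the studies cited above.

```r
library(igraph)

# Toy star graph: vertex 1 is the center, vertices 2 to 6 are leaves
g <- make_star(6, mode = "undirected")
E(g)$weight <- c(1, 2, 3, 4, 5)  # made-up edge weights

# Vertex size: rescale degree to the range 10 to 30
deg <- degree(g)
V(g)$size <- 10 + 20 * (deg - min(deg)) / (max(deg) - min(deg))

# Vertex color: highlight the most connected vertex
V(g)$color <- ifelse(deg == max(deg), "red3", "gray40")

# Edge width: proportional to the edge weight
E(g)$width <- E(g)$weight

# igraph's plot() picks up the size, color, and width attributes
plot(g, vertex.label = NA)
```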
The type of data needs to be taken into consideration as well. Egocentric data encompass diverse types of social network measurements, including degrees of mesh (Barnes 1954), level of knittedness (Bott 2002), local or cosmopolitan orientation (Merton 1968), strength of ties (Granovetter 1973), family or friend networks (Wellman 1979), and more. These measurements revolve around the social relationships in a central individual’s immediate context, offering insights into their social status and the flow of information, support, or resources (Marsden and Hollstein 2023). Applications range from health-related topics (Burgette et al. 2021) and social behaviors (Carrasco 2008) to ecological data (Mascareno 2020) and beyond. Network analysis also involves small-world networks, which exhibit high clustering and short characteristic path lengths, such as brain networks (Bassett 2006), electric power grids and airport connections (Amaral 2000), and social connections (Newman 2000), among others. At the other end of the spectrum, large networks can comprise up to billions of nodes and edges, capturing connections within a community; examples include social media platforms, mobile phone networks, and website connections (Blondel 2008). Furthermore, bipartite networks, which model relationships between two distinct sets of entities, find applications in various fields, including microbiology (Corel 2018), plant-animal mutualistic networks (Jordano), and artistic collaboration networks (Uzzi 2005), among others (Banerjee 2017). Understanding these different types of data and their applications provides valuable insights into the complexities of interconnected systems.
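Two of these data types can be sketched with small toy graphs in igraph. The graphs below are assumptions made for illustration only: an egocentric network is extracted as the immediate neighborhood of one vertex, and a bipartite network is built from a logical vector that assigns each vertex to one of two sets.

```r
library(igraph)

# Egocentric view: vertex 1 of a toy ring together with its direct neighbors
g <- make_ring(10)
ego_net <- make_ego_graph(g, order = 1, nodes = 1)[[1]]
vcount(ego_net)  # the ego plus its two neighbors

# Bipartite network: vertices 1-2 form one set (FALSE), vertices 3-5 the other (TRUE)
b <- make_bipartite_graph(
  types = c(FALSE, FALSE, TRUE, TRUE, TRUE),
  edges = c(1, 3, 1, 4, 2, 4, 2, 5)  # consecutive pairs are edges across the sets
)
is_bipartite(b)
```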
In this section, we will present two full-length examples of network visualization. In both, we will start with raw data sets, walking through how to read and process the data and how to build a visualization step by step. Throughout the paper, we will use the igraph (Csardi and Nepusz 2006; Csárdi et al. 2023), data.table (Dowle and Srinivasan 2023), and netplot (Vega Yon and Bischoff 2023) R packages. We start by loading those packages:
library(igraph)
library(data.table)
# netplot can be installed from GitHub if it is not already available:
# devtools::install_github("USCCANA/netplot")
library(netplot)
For the first example, we will use a data set from the paper “Estimates of Social Contact in a Middle School Based on Self-Report and Wireless Sensor Data” by Leecaster et al., which describes the social networks of 7th- and 8th-grade students. The data include attributes such as gender, lunch period, and grade, which we will use to build our visualization.
First, we read in the data and take a quick look at it.
# loading and cleaning data
students <- fread("./data/middle_school/pone.0153690.s001.csv")
interactions <- fread("./data/middle_school/pone.0153690.s003.csv")
head(students)
## id grade gender unique lunch initialsNum
## 1: 2003 7 0 0 1 386
## 2: 2004 8 1 1 1 402
## 3: 2006 7 1 1 2 288
## 4: 2008 8 0 1 1 199
## 5: 2009 7 1 0 1 147
## 6: 2010 8 1 0 1 157
head(interactions)
## id contactGender contactGrade contactId ClassPeriod contactInitialNum
## 1: 2004 1 8 3127 4 323
## 2: 2004 0 8 2620 1 335
## 3: 2004 1 8 99 1 401
## 4: 2004 1 8 99 9 401
## 5: 2004 1 8 99 9 401
## 6: 2004 1 8 99 9 401
Before using the data, we need to remove the missing values (NA) and miscoded entries. We also see a large number of students who only have interactions with themselves (they do not interact with anyone else throughout the day); these “isolates” are removed further below so that the graph is easier to read.
# drop students with a missing 'id'
students <- students[!is.na(id)]
# keep only students whose gender is coded "0" or "1"
students <- students[gender %in% c("0", "1")]
# drop interactions with a missing 'id' or 'contactId'
interactions <- interactions[!is.na(id) & !is.na(contactId)]
# ids of the students that remain after cleaning
ids <- sort(unique(students$id))
# keep interactions where both ends are retained students
# (this narrows the interactions from 10781 to 5150)
interactions <- interactions[(id %in% ids) & (contactId %in% ids)]
# load the helper script for coloring nodes
source(file = "./misc/color_nodes_function.R")
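The filtering logic above can be checked on a miniature example. The two toy tables below are made up for illustration; they only mimic the `id`, `gender`, and `contactId` columns of the real data.

```r
library(data.table)

# Made-up miniature versions of the two tables
students_toy <- data.table(id = c(1, 2, 3, NA), gender = c(0, 1, 9, 1))
interactions_toy <- data.table(
  id        = c(1, 1, 2, 5),
  contactId = c(2, 3, NA, 1)
)

# Keep students with a valid id and a gender coded 0 or 1
students_toy <- students_toy[!is.na(id) & gender %in% c(0, 1)]

# Keep interactions whose two endpoints are both retained students
valid_ids <- students_toy$id
interactions_toy <- interactions_toy[id %in% valid_ids & contactId %in% valid_ids]

interactions_toy
```

Only the interaction between students 1 and 2 survives: student 3 has a miscoded gender, one contact is missing, and student 5 does not appear in the student table.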
Next, the two data sets are combined into a single igraph object.
## Creating an igraph object from the two data sets
net <- graph_from_data_frame(
  d = interactions[, .(id, contactId)],
  directed = FALSE,
  vertices = as.data.frame(students)
)
## Keeping only connected individuals (dropping isolates)
net_with_no_isolates <- induced_subgraph(net, which(degree(net) > 0))
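The same two steps can be sketched on a toy edge list (made up for illustration). The sketch also shows one caveat worth knowing: in igraph, a self-loop counts toward a vertex's degree by default, so a vertex connected only to itself is not dropped by `degree(g) > 0` unless `loops = FALSE` is used. Whether self-loops are present in the real data is not checked here.

```r
library(igraph)

# Toy data: "d" is connected only to itself, "e" has no edges at all
edges <- data.frame(from = c("a", "b", "d"), to = c("b", "c", "d"))
vertices <- data.frame(
  name  = c("a", "b", "c", "d", "e"),
  grade = c(7, 7, 8, 8, 7)
)

g <- graph_from_data_frame(edges, directed = FALSE, vertices = vertices)

degree(g)                 # the self-loop on "d" contributes to its degree
degree(g, loops = FALSE)  # ignoring self-loops, "d" has degree 0

# Keep only vertices connected to someone other than themselves
g_connected <- induced_subgraph(g, which(degree(g, loops = FALSE) > 0))
V(g_connected)$name
```

Vertex attributes such as `grade` are carried over to the subgraph, so they remain available for coloring and shaping the vertices.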
Finally, we plot the network using netplot’s default settings.
## Plot with no isolates
set.seed(3)
nplot(net_with_no_isolates)
Starting from this plot, we can customize a number of aspects of the graph. First, in order to work with the “color_nodes” function, we need to make “grade” a factor instead of a numeric variable. We also choose the colors we would like the nodes to have.
## adjust 'grade' to factor
V(net_with_no_isolates)$grade <- as.factor(V(net_with_no_isolates)$grade)
# plotting connections among grades ####
set.seed(3)
a_colors <- color_nodes(net_with_no_isolates,"grade", c("gray40","red3"))
attr(a_colors, "map")
## 7 8
## "#666666" "#CD0000"
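The code of the color_nodes function is not shown in this section. To illustrate the kind of mapping that the output above suggests (one color per factor level, stored in a "map" attribute), here is a hypothetical stand-in written in base R. The name `color_by_attribute` and its arguments are assumptions for illustration, not the actual helper.

```r
# Hypothetical stand-in for a helper like color_nodes(); not the actual function
color_by_attribute <- function(values, colors) {
  values <- as.factor(values)
  lvls <- levels(values)
  stopifnot(length(colors) >= length(lvls))

  # Convert color names to hex codes, one per factor level
  rgb_mat <- grDevices::col2rgb(colors[seq_along(lvls)])
  hex <- grDevices::rgb(rgb_mat[1, ], rgb_mat[2, ], rgb_mat[3, ],
                        maxColorValue = 255)
  names(hex) <- lvls

  # One color per observation, plus the level-to-color map as an attribute
  out <- unname(hex[as.character(values)])
  attr(out, "map") <- hex
  out
}

cols <- color_by_attribute(c(7, 8, 8, 7), c("gray40", "red3"))
attr(cols, "map")
```

With these inputs, the map pairs level 7 with "#666666" and level 8 with "#CD0000", matching the output shown above.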
Now we can create a customized plot. This uses the same data as the plot above, with the following adjustments to the nodes:
Color the vertices (‘vertex.color’) according to the student’s grade (7th graders in gray and 8th graders in red).
Adjust the shape of the vertices (‘vertex.nsides’). 7th graders are drawn with 10 sides, which looks close to a circle, and all other students are drawn as triangles.
Adjust the size of the vertices (‘vertex.size.range’).
Remove the vertex labels.
set.seed(3)
grades <- nplot(
  net_with_no_isolates,
  vertex.color = color_nodes(net_with_no_isolates, "grade", c("gray40", "red3")),
  vertex.nsides = ifelse(V(net_with_no_isolates)$grade == 7, 10, 3),
  vertex.size.range = c(0.015, 0.020),
  vertex.label = NULL
)
print(grades)
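Note that grade was converted to a factor earlier, yet the condition in vertex.nsides compares it with the number 7. A minimal sketch with made-up values shows why this still works in R: comparing a factor with `==` compares its level labels, so the numeric 7 is matched against the label "7".

```r
# Made-up grades stored as a factor, as in the example above
grade <- as.factor(c(7, 8, 7))

# The comparison is made against the level labels "7" and "8"
is_seventh <- grade == 7
is_seventh

# Number of sides per vertex: 10 for 7th graders, 3 for everyone else
shapes <- ifelse(is_seventh, 10, 3)
shapes
```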
This looks good, but let’s alter the parameters we just set to give the plot a different look.
Change vertex.color so that it is tied to a color palette.
Adjust vertex.nsides to draw 7th graders as octagons and 8th graders as hexagons.
Adjust vertex.size.range to make each vertex smaller.
Add and adjust vertex labels with the vertex.label.[specific_option] arguments:
vertex.label.fontsize adjusts the font size.
vertex.label.show adjusts the proportion of labels to keep.
Adjust vertex.frame.color to give each vertex an outline.
library(igraph)
library(RColorBrewer)
# Create a color palette using RColorBrewer
palette <- brewer.pal(3, "Set1") # Change the number and palette name as needed
set.seed(3)
grades <- nplot(
  net_with_no_isolates,
  vertex.color = color_nodes(net_with_no_isolates, "grade", palette),
  vertex.nsides = ifelse(V(net_with_no_isolates)$grade == 7, 8, 6),
  vertex.size.range = c(0.01, 0.011),
  vertex.label.fontsize = 10,
  vertex.label.show = .25,
  vertex.frame.color = "black"
)
print(grades)
Now that we have explored a bit about vertices, let’s dive into options related to edges.
Change edge.width.range to make the edges wider or thinner.
Change edge.color to blue.
Change edge.color.alpha to adjust transparency.
set.seed(3)
grades <- nplot(
  net_with_no_isolates,
  vertex.label = NULL,
  edge.width.range = c(.25, 1),
  edge.color = "dodgerblue4",
  edge.color.alpha = .33
)
print(grades)
Now, let’s adjust everything again, showing some of the things that netplot can do with edges.
Adjust edge.color so that each edge is drawn as a gradient between the colors of the two vertices it connects.
Set edge.curvature to 0 to make the edges straight lines.
Adjust edge.line.lty to draw the edges with long dashes.
set.seed(3)
grades <- nplot(
  net_with_no_isolates,
  vertex.label = NULL,
  edge.width.range = c(1, 1),
  vertex.color = color_nodes(net_with_no_isolates, "grade", c("blue", "red3")),
  edge.color = ~ego(alpha = 0.5) + alter(alpha = 0.5),
  edge.curvature = 0,
  edge.line.lty = 5
)
print(grades)
Using the same plot that we originally created, we can also adjust some aspects beyond vertices and edges.
Adjust bg.col to make the background color slate gray.
Adjust sample.edges to draw only a proportion of the edges (here, half of them).
## Plot with no isolates
set.seed(3)
nplot(
  net_with_no_isolates,
  vertex.label = NULL,
  bg.col = "slategray1",
  sample.edges = .5
)
We can adjust these options further to get a different outcome.
Set skip.edges to TRUE to remove the edges altogether.
Adjust bg.col to misty rose.
Set zero.margins to TRUE.
## Plot with no isolates
set.seed(3)
nplot(
  net_with_no_isolates,
  vertex.label = NULL,
  skip.edges = TRUE,
  bg.col = "mistyrose",
  zero.margins = TRUE
)
The middle school data set illustrates what netplot can do: there are options to adjust the vertices, the edges, and other parameters such as the background and margins.
For the second example, we use data from “Assessing Pathogen Transmission Opportunities: Variation in Nursing Home Staff-Resident Interactions” by Chang et al. The study explores connections between patients and healthcare providers in a number of nursing homes across 7 states. There are 99 networks in the data set.
With this data, we will explore how multiple smaller networks can work together to tell a story and can be plotted using netplot.
First, we load the data along with the additional package we will be using.
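# attaching packages
library(network)
# load() restores the saved objects into the workspace and returns their names;
# the code below uses the restored 'networks' object
data <- load("./data/nursing_home/network99_f1.RData")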
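Because `load()` returns the names of the restored objects rather than the objects themselves, the value assigned to `data` is a character vector. A minimal sketch with a temporary file and a made-up object illustrates this behavior:

```r
# Save a toy object to a temporary .RData file
networks_toy <- list(a = 1:3, b = 4:6)
tmp <- tempfile(fileext = ".RData")
save(networks_toy, file = tmp)
rm(networks_toy)

# load() restores the object under its original name and
# returns the names of the restored objects
loaded_names <- load(tmp)
loaded_names

# The object itself is available again under its saved name
exists("networks_toy")
length(networks_toy)
```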
The data are already in a clean format, so we are ready to plot. First, let’s pull out the first and second networks so we can have a closer look at them.
# Creates an empty list to store the plots
nets <- list()
# Sets a seed for reproducibility
set.seed(1231)
for (i in 1:2) { # only the first two networks
  # Extracts the "is_actor" vertex attribute, which flags healthcare providers (HCP)
  is_health_care_provider <- networks[[i]] %v% "is_actor"
  nets[[i]] <- nplot(
    networks[[i]],
    # Colors HCP vertices gray and patient vertices red
    vertex.color = ifelse(is_health_care_provider, "gray40", "red3"),
    # Draws HCP vertices with 4 sides and patient vertices with 10 (nearly round)
    vertex.nsides = ifelse(is_health_care_provider, 4, 10),
    # Makes HCP vertices larger than patient vertices
    vertex.size = ifelse(is_health_care_provider, .25, .15),
    vertex.size.range = c(.015, .065),
    edge.width.range = c(.25, .5),
    # Sets edge line breaks to 1 and colors edges light gray
    edge.line.breaks = 1,
    edge.color = ~ego(alpha = 1, col = "lightgray") + alter(alpha = 1, col = "lightgray"),
    edge.curvature = pi / 6,
    # Removes vertex labels
    vertex.label = NULL
  )
}
# Combines the 2 plots into a 1x2 grid
allgraphs <- gridExtra::grid.arrange(grobs = nets, nrow = 1, ncol = 2)
Here, healthcare providers are represented by gray diamonds, while patients are represented by red circles.
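The `%v%` operator used in the loop comes from the network package and reads (or sets) a vertex attribute. The following minimal sketch uses a toy network with made-up attribute values to show how the attribute becomes a vector that `ifelse()` can turn into colors:

```r
library(network)

# Toy network with four vertices; the attribute values are made up
net_toy <- network.initialize(4, directed = FALSE)
net_toy %v% "is_actor" <- c(TRUE, FALSE, FALSE, TRUE)

# %v% extracts the vertex attribute as a vector, one entry per vertex
is_hcp <- net_toy %v% "is_actor"

# Map the logical attribute to one color per vertex
vertex_colors <- ifelse(is_hcp, "gray40", "red3")
vertex_colors
```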
Much like the previous example, we can use the different options in netplot to adjust how the graph looks.
Adjust vertex.color so providers are purple instead of gray and patients are pink instead of red.
Adjust vertex.nsides so providers are triangles and patients are hexagons.
Adjust edge.line.breaks to make the edges curved instead of straight.
Adjust edge.color so edges are now black instead of gray.
Adjust alpha so the black is slightly transparent.
Adjust edge.curvature to make the edges more curved.
# Creates an empty list to store the plots
nets <- list()
# Sets a seed for reproducibility
set.seed(1231)
for (i in 1:2) { # only the first two networks
  # Extracts the "is_actor" vertex attribute, which flags healthcare providers (HCP)
  is_health_care_provider <- networks[[i]] %v% "is_actor"
  nets[[i]] <- nplot(
    networks[[i]],
    # Colors HCP vertices purple and patient vertices pink
    vertex.color = ifelse(is_health_care_provider, "purple", "pink"),
    # Draws HCP vertices as triangles and patient vertices as hexagons
    vertex.nsides = ifelse(is_health_care_provider == TRUE, 3, 6),
    # Makes HCP vertices larger than patient vertices
    vertex.size = ifelse(is_health_care_provider == TRUE, .25, .15),
    vertex.size.range = c(.015, .065),
    edge.width.range = c(.25, .5),
    # Sets edge line breaks to 6 and colors edges a slightly transparent black
    edge.line.breaks = 6,
    edge.color = ~ego(alpha = .8, col = "black") + alter(alpha = .8, col = "black"),
    edge.curvature = pi / 3,
    # Removes vertex labels
    vertex.label = NULL
  )
}
# Combines the 2 plots into a 1x2 grid
allgraphs <- gridExtra::grid.arrange(grobs = nets, nrow = 1, ncol = 2)
Now that we have seen what these networks look like up close, we can plot all of them for comparison.
# Creates an empty list to store the plots
nets <- list()
# Sets a seed for reproducibility
set.seed(1231)
for (i in 1:99) {
  # Extracts the "is_actor" vertex attribute, which flags healthcare providers (HCP)
  is_health_care_provider <- networks[[i]] %v% "is_actor"
  nets[[i]] <- nplot(
    networks[[i]],
    # Colors HCP vertices gray and patient vertices red
    vertex.color = ifelse(is_health_care_provider, "gray40", "red3"),
    # Draws HCP vertices with 4 sides and patient vertices with 10 (nearly round)
    vertex.nsides = ifelse(is_health_care_provider == TRUE, 4, 10),
    # Makes HCP vertices larger than patient vertices
    vertex.size = ifelse(is_health_care_provider == TRUE, .25, .15),
    vertex.size.range = c(.015, .065),
    edge.width.range = c(.25, .5),
    # Sets edge line breaks to 1 and colors edges light gray
    edge.line.breaks = 1,
    edge.color = ~ego(alpha = 1, col = "lightgray") + alter(alpha = 1, col = "lightgray"),
    edge.curvature = pi / 6,
    # Removes vertex labels
    vertex.label = NULL
  )
}
# Combines the 99 plots into an 11x9 grid
allgraphs <- gridExtra::grid.arrange(grobs = nets, nrow = 11, ncol = 9)
As this shows, netplot can be used in a for loop to create a large number of graphs from large amounts of data very quickly. Having these graphs side by side allows for quick and easy analysis of their similarities and differences.
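The 11 by 9 grid above was chosen by hand for 99 plots. As a small sketch, a helper can pick near-square grid dimensions for any number of plots. The name `grid_dims` is hypothetical and is not part of gridExtra or netplot.

```r
# Hypothetical helper: near-square grid dimensions that fit n plots
grid_dims <- function(n) {
  ncol <- ceiling(sqrt(n))
  nrow <- ceiling(n / ncol)
  c(nrow = nrow, ncol = ncol)
}

dims <- grid_dims(99)
dims

# The result could then be passed on, for example:
# gridExtra::grid.arrange(grobs = nets, nrow = dims[["nrow"]], ncol = dims[["ncol"]])
```

For 99 plots this gives a 10 by 10 grid with one empty cell, which is an alternative to the 11 by 9 arrangement used above.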